Multidimensional Range Queries on Modern Hardware
Abstract
Range queries over multidimensional data are an important part of database workloads in many applications. Their execution may be accelerated by using multidimensional index structures (MDIS), such as kd-trees or R-trees. As for most index structures, the usefulness of this approach depends on the selectivity of the queries, and conventional wisdom has held that a simple scan beats MDIS for queries accessing more than 15%-20% of a dataset. However, this wisdom is largely based on evaluations that are almost two decades old, performed on disk-resident data, applying IO-optimized data structures, and using single-core systems. The question is whether this rule of thumb still holds when multidimensional range queries (MDRQ) are performed on modern architectures with large main memories holding all data, multi-core CPUs, and data-parallel instruction sets. In this paper, we study whether and how much modern hardware influences the performance ratio between index structures and scans for MDRQ. To this end, we conservatively adapted three popular MDIS, namely the R*-tree, the kd-tree, and the VA-file, to exploit features of modern servers and compared their performance to different flavors of parallel scans using multiple (synthetic and real-world) analytical workloads over multiple (synthetic and real-world) datasets of varying size, dimensionality, and skew. We find that all approaches benefit considerably from using main memory and parallelization, yet to varying degrees. Our evaluation shows that, on current machines, the new rule of thumb for the selectivity threshold above which scanning should be favored over parallel versions of classical MDIS is around 1%.
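To make the terms in the abstract concrete, the following is a minimal sketch of a multidimensional range query evaluated by a plain sequential scan, together with the query's selectivity. It is not the authors' implementation: the function names (`range_scan`, `selectivity`, `prefer_scan`) and the sample points are hypothetical, and the default threshold of 0.01 simply restates the roughly 1% rule of thumb reported above.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def range_scan(points: Sequence[Point], lower: Point, upper: Point) -> List[int]:
    """Return the indices of all points inside the axis-aligned box
    [lower, upper], i.e. lower[d] <= p[d] <= upper[d] in every dimension d."""
    hits = []
    for i, p in enumerate(points):
        if all(lo <= x <= hi for x, lo, hi in zip(p, lower, upper)):
            hits.append(i)
    return hits


def selectivity(points: Sequence[Point], lower: Point, upper: Point) -> float:
    """Fraction of the dataset returned by the range query."""
    if not points:
        return 0.0
    return len(range_scan(points, lower, upper)) / len(points)


def prefer_scan(sel: float, threshold: float = 0.01) -> bool:
    """Hypothetical helper: favor a scan once selectivity exceeds the threshold.
    The default 0.01 mirrors the ~1% rule of thumb from the abstract; the
    older rule would correspond to a threshold of roughly 0.15 to 0.20."""
    return sel > threshold


# Illustrative two-dimensional data (made up for this sketch).
points = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0), (7.0, 8.0)]
hits = range_scan(points, lower=(2.0, 3.0), upper=(6.0, 7.0))
sel = selectivity(points, lower=(2.0, 3.0), upper=(6.0, 7.0))
```

In practice the selectivity of a query is not known before it runs, so a system would have to estimate it (for example from histograms) before choosing between an index and a scan; the sketch computes it exactly only for illustration.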
Similar resources
Optimization of Disk Accesses for Multidimensional Range Queries
Multidimensional data structures have become very popular in recent years. Their importance lies in efficient indexing of data, which have naturally multidimensional characteristics like navigation data, drawing specifications etc. The R-tree is a well-known structure based on the bounding of spatial near points by rectangles. Although efficient query processing of multidimensional data is requ...
Efficient evaluation of partially-dimensional range queries in large OLAP datasets
In light of the increasing requirement for processing multidimensional queries on OLAP (relational) data, the database community has focused on the queries (especially range queries) on the large OLAP datasets from the view of multidimensional data. It is well-known that multidimensional indices are helpful to improve the performance of such queries. However, we found that much information irre...
The mQp Tree: A Multi-Dimensional Access Method based on a Non-Binary Tree
Many applications require data base systems with the capability to manage large amounts of data and to answer complex queries, especially those with several attributes, such as range queries and more particularly, spatial and temporal queries. Consequently, new access methods, with good performance for these types of operations associated with medium-high dimensionality, are needed. From this p...
Overview of Graph Search and Beyond
Modern Web data is highly structured in terms of entities and relations from large knowledge resources, geo-temporal references and social network structure, resulting in a massive multidimensional graph. This graph essentially unifies both the searcher and the information resources that played a fundamentally different role in traditional IR, and “Graph Search” offers major new ways to access ...
Query-Driven Knowledge Discovery via OLAP manipulations
We study KDD (Knowledge Discovery in Databases) processes on OLAP (multidimensional and multilevel) data from a query point of view. Focusing on association rule mining, we consider typical queries to cope with the pre-processing of multidimensional data and the post-processing of the discovered patterns as well. We use a model and a rule-based language stemming from the OLAP representation and...
Journal title:
- CoRR
Volume: abs/1801.03644
Publication date: 2018